Indexing Documents by Discourse and Semantic Contents from Automatic Annotations of Texts

Authors

  • Brahim Djioua
  • Jean-Pierre Desclés
Abstract

The basic aim of the model proposed here is to automatically build a semantic metatext structure for texts, so that discourse and semantic information can be searched for and extracted from texts indexed in this way. The model is built from two engines. The first, EXCOM (Djioua et al., 2006), is an XML-based system that automatically annotates texts according to discourse and semantic categories. The second, MOCXE, uses the semantic annotations generated by EXCOM to create a semantic inverted index that can find relevant documents for queries associated with discourse and semantic categories such as definition, quotation, causality, and relations between concepts. We illustrate the approach with an example of a “connection” relation between concepts in French. The model is general enough to be applied to other languages.

General presentation

Current web search engines that index texts generate representations as a set of simple and complex ...
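The idea of a semantic inverted index built from annotated text can be illustrated with a minimal sketch. This is not the EXCOM or MOCXE implementation: the `Annotation` record, the category labels, the whitespace tokenizer, and the functions `build_semantic_index` and `search` are all hypothetical names chosen for illustration, and the annotations are written by hand where the real system would produce them automatically. The sketch only assumes that each annotated segment carries a document identifier, a discourse category, and its text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated segment (hypothetical stand-in for an annotator's output)."""
    doc_id: str
    segment_id: int
    category: str  # e.g. "definition", "quotation", "causality"
    text: str


def tokenize(text):
    """Very naive tokenizer: lowercase, split on whitespace, trim punctuation."""
    tokens = (raw.strip('.,;:!?"()') for raw in text.lower().split())
    return [tok for tok in tokens if tok]


def build_semantic_index(annotations):
    """Map (category, term) to the set of (doc_id, segment_id) that contain it."""
    index = defaultdict(set)
    for ann in annotations:
        for term in tokenize(ann.text):
            index[(ann.category, term)].add((ann.doc_id, ann.segment_id))
    return index


def search(index, category, query):
    """Return sorted ids of documents that have one segment of the given
    category containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get((category, term), set()) for term in terms]
    segments = set.intersection(*postings)
    return sorted({doc_id for doc_id, _ in segments})


# Hand-written sample annotations (assumed data, for illustration only).
annotations = [
    Annotation("doc1", 0, "definition", "An inverted index maps terms to documents."),
    Annotation("doc1", 1, "causality", "Stemming errors cause missed matches."),
    Annotation("doc2", 0, "quotation", "The author said that an index is essential."),
]
index = build_semantic_index(annotations)

print(search(index, "definition", "inverted index"))  # ['doc1']
print(search(index, "quotation", "index"))            # ['doc2']
print(search(index, "definition", "essential"))       # []
```

Keying the postings on the pair (category, term) is what separates this from a plain keyword index: the word "index" occurs in both documents, but a query restricted to the "definition" category returns only the document in which it appears inside a segment annotated as a definition.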

Similar resources

Automatic Semantic Subject Indexing of Web Documents in Highly Inflected Languages

Structured semantic metadata about unstructured web documents can be created using automatic subject indexing methods, avoiding laborious manual indexing. A successful automatic subject indexing tool for the web should work with texts in multiple languages and be independent of the domain of discourse of the documents and controlled vocabularies. However, analyzing text written in a highly infle...

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

A Semi-Automatic Approach of old Arabic Documents Indexing

Indexing is a widely used technique in retrieval systems. Its goal is to extract and represent the meaning of a document so that it can be found by the user. There are two types of indexing: manual and automatic. Automatic indexing requires character and word recognition engines, which work only on the texts of contemporary documents. In this paper, we p...

Semantic Indexing and Typed Hyperlinking

In this paper, we describe linguistically sophisticated tools for the automatic annotation and navigation of on-line documents. Creation of these tools relies on research into finite-state technologies for the design and development of lexically intensive semantic indexing, shallow semantic understanding, and content abstraction techniques for texts. These tools utilize robust language processin...


Publication date: 2007